voice transcription


Toward Leveraging Pre-Trained Self-Supervised Frontends for Automatic Singing Voice Understanding Tasks: Three Case Studies

Yamamoto, Yuya

arXiv.org Artificial Intelligence

Automatic singing voice understanding tasks, such as singer identification, singing voice transcription, and singing technique classification, benefit from data-driven approaches that utilize deep learning techniques. These approaches work well even under the rich diversity of vocal and noisy samples owing to their representation ability. However, the limited availability of labeled data remains a significant obstacle to achieving satisfactory performance. In recent years, self-supervised learning models (SSL models) have been trained using large amounts of unlabeled data in the fields of speech processing and music classification. By fine-tuning these models for the target tasks, performance comparable to conventional supervised learning can be achieved with limited training data. Therefore, in this paper, we investigate the effectiveness of SSL models for various singing voice recognition tasks. As an initial exploration, we report the results of experiments comparing SSL models on three different tasks (i.e., singer identification, singing voice transcription, and singing technique classification) and discuss the findings. Experimental results show that each SSL model achieves performance comparable to, and sometimes better than, state-of-the-art methods on each task. We also conducted a layer-wise analysis to further understand the behavior of the SSL models.


How Artificial Intelligence Is Taking Over Our Gadgets

#artificialintelligence

If you think of AI as something futuristic and abstract, start thinking different. We're now witnessing a turning point for artificial intelligence, as more of it comes down from the clouds and into our smartphones and automobiles. While it's fair to say that AI that lives on the "edge" -- where you and I are -- is still far less powerful than its datacenter-based counterpart, it's potentially far more meaningful to our everyday lives. One key example: This fall, Apple's Siri assistant will start processing voice on iPhones. Right now, even your request to set a timer is sent as an audio recording to the cloud, where it is processed, triggering a response that's sent back to the phone.


How AI Is Taking Over Our Gadgets

#artificialintelligence

One key example: This fall, Apple's Siri assistant will start processing voice on iPhones. Right now, even your request to set a timer is sent as an audio recording to the cloud, where it is processed, triggering a response that's sent back to the phone. By processing voice on the phone, says Apple, Siri will respond more quickly. This will only work on the iPhone XS and newer models, which have a compatible built-for-AI processor Apple calls a "neural engine." People might also feel more secure knowing that their voice recordings aren't being sent to unseen computers in faraway places.


Digitizing Voice; A Great Source for Organizations to Tap

#artificialintelligence

Organizations need to take every advantage that their data mesh affords. A truly underutilized data source is digitized speech, which can be turned into actions based on reusable voice data. Savvy organizations are learning to rely on machine learning (ML) combined with natural language processing (NLP) to take advantage of voice transcriptions quickly and accurately for business advantage. What are the major benefit streams for organizations to tap? What are some essential functions to look for in a voice-based vendor?


AI for Voice Transcription: Is It Here to Last?

#artificialintelligence

AI is one of the driving forces behind what the World Economic Forum called "The Fourth Industrial Revolution". Developments in this area are expected to help us further automate our workflows and simplify our daily tasks, making everything from our food production chains to management and even medical procedures far more effective and agile. And, according to PwC, AI is expected to add up to 15.6 trillion dollars to the world economy by 2030. AI is getting smarter faster than ever, with established players such as Google and Amazon developing and integrating AI into their products and operations, and a generation of startups from all around the globe developing and offering AI-based tools. One of the main areas in which AI is starting to be used is transcription services.


Why AI and Humans Are Stronger Together Than Apart

#artificialintelligence

While artificial intelligence (AI) is radically altering how work gets done and who does it, the technology's larger impact will be in augmenting human capabilities, not replacing them. In fact, a report from Harvard Business Review found that firms achieve the most significant performance improvements when humans and machines work together. After all, what comes naturally to people (interpersonal communication, for example) can be tricky for AI, while simple AI tasks like transcribing data remain challenging for humans. AI and humans should work together to double-check for errors and help augment each other's capabilities. By integrating human talents and AI-driven functions, companies across industries can reap the benefits of AI.